Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints

نویسندگان

  • Gustavo Glusman
  • Denise E. Mauldin
  • Leroy E. Hood
  • Max Robinson
چکیده

We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into "genome fingerprints" via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Interactive gene-order comparison for multiple small genomes

UNLABELLED The Genome Organization Analysis Tool (GOAT) is a program that performs comparative sequence analysis on ordered gene lists from annotated genomes, provides visual and tabular output, and provides means of accessing and analyzing related gene sequence data, for the purpose of comparing genome organization at the gene-order level. GOAT can be used to compare any two or more genomes or...

متن کامل

MBGD: microbial genome database for comparative analysis

MBGD is a workbench system for comparative analysis of completely sequenced microbial genomes. The central function of MBGD is to create an orthologous gene classification table using precomputed all-against-all similarity relationships among genes in multiple genomes. In MBGD, an automated classification algorithm has been implemented so that users can create their own classification table by ...

متن کامل

Genome Wide Allele Frequency Fingerprints (GWAFFs) of Populations via Genotyping by Sequencing

Genotyping-by-Sequencing (GBS) is an excellent tool for characterising genetic variation between plant genomes. To date, its use has been reported only for genotyping of single individuals. However, there are many applications where resolving allele frequencies within populations on a genome-wide scale would be very powerful, examples include the breeding of outbreeding species, varietal protec...

متن کامل

Acquired Antimicrobial Resistance Genes of Escherichia coli Obtained from Nigeria: In silico Genome Analysis

Background: Antimicrobial resistance is a global problem with enormous public health and economic impact. This study was carried out to get an overview of acquired antimicrobial resistance gene sequences in the genomes of Escherichia coli isolated from different food sources and the environment in Nigeria. Methods: To determine the acquired antimicrobial-resistant genes prevalence, genome asse...

متن کامل

MolliGen, a database dedicated to the comparative genomics of Mollicutes

Bacteria belonging to the class Mollicutes were among the first ones to be selected for complete genome sequencing because of the minimal size of their genomes and their pathogenicity for humans and a broad range of animals and plants. At this time six genome sequences have been publicly released (Mycoplasma genitalium, Mycoplasma pneumoniae, Ureaplasma urealyticum-parvum, Mycoplasma pulmonis, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2017